Abstract

Natural polymers are able to self-assemble into versatile nanostructures based on the information encoded into their 
primary structure. The structural richness of biopolymer-based nanostructures depends on the information content of 
building blocks and the available biological machinery to assemble and decode polymers with a defined sequence. 
Natural polypeptides comprise 20 amino acids with very different properties in comparison to only 4 structurally similar 
nucleotides, the building elements of nucleic acids. Nevertheless, the ease of synthesizing polynucleotides with a selected
sequence and the ability to encode the nanostructural assembly based on the two specific nucleotide pairs underlie the
development of techniques to self-assemble almost any selected three-dimensional nanostructure from polynucleotides.
Despite more complex design rules, peptides were successfully used to assemble symmetric nanostructures, such as fibrils 
and spheres. While earlier designed protein-based nanostructures used linked natural oligomerizing domains, the recent
design of new oligomerizing interaction surfaces and the introduction of a platform for topologically designed protein folds
may enable polypeptide-based design to follow the track of DNA nanostructures. The advantages of protein-based
nanostructures, such as functional versatility and cost-effective and sustainable production methods, provide a strong
incentive for further development in this direction.
Introduction

The versatility of biopolymers can be used to rationally de- 
sign new molecules and assemblies with structures and 
functionalities unseen in nature. The ability of biopolymers
to self-assemble into complex shapes and structures defined
at the nanometer scale, together with our capacity for sustainable
large-scale production using cell factories, makes them
highly desirable for diverse technological applications. In
the rapidly growing research area of modern nanobiotechnology,
the natural components polypeptides and nucleic
acids have been employed as building blocks for the assembly
of newly designed nanostructures and nanomaterials.
In recent decades bionanotechnologists have achieved important
advances in protein-based and particularly DNA-based
responsive nanostructures, which can now be designed
to self-assemble into almost any selected shape.

Molecular self-assembly, the main organizing principle
of biological systems, is also widely applied in
nanotechnology as the driving force for the assembly of
artificial nanostructures. In self-assembly the final structure
is encoded by the interactions of its building elements, defined
by their properties and by the order of building blocks within
the linear polymer. The shapes and functions of both
DNA- and protein-based nanostructures are encoded by
the sequence of their constituents, nucleotides and amino
acids. In addition, the architecture of both types of
nanostructures can be affected by environmental
factors, such as solvent, pH, temperature and building
block concentration.

DNA nanostructures are based on the Watson-Crick 
nucleic base complementarity. There are only two differ- 
ent base pairs based on a specific pairwise interaction, 
where stacking with neighboring pairs underlies the for- 
mation of stable double-helical domains that serve as 
the nanostructural building blocks. Some of the most
spectacular examples of the potential of nanobiotechnology
have been demonstrated by DNA-based nanostructures.
In nature the primary functions of nucleic
acids are the storage, processing and mediation of genetic
information; however, natural structures such as
aptamers, telomeres and, in part, the ribosome, one
of the key and most complex nanodevices, are formed by
nucleic acids assembled into 3D structures. The physiological
relevance of nucleic acids that perform
their function in the form of self-assembled noncoding
RNA transcripts is still unknown. On the other hand,
artificial rationally designed DNA nanostructures, which
utilize a narrower subset of interactions than aptamers,
can adopt a huge diversity of 2D or 3D shapes [1-5].
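
The two Watson-Crick pairing rules mentioned above (A with T, G with C) are enough to work out which strand hybridizes with a given sequence. The following is a minimal Python sketch of that rule only; the function names and the example sequences are illustrative assumptions and are not taken from any of the cited design tools.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Return the partner strand (written 5'->3') of an antiparallel duplex."""
    return "".join(PAIR[base] for base in reversed(strand.upper()))

def forms_duplex(strand_a, strand_b):
    """True if the two strands are fully complementary over their whole length."""
    return strand_b.upper() == reverse_complement(strand_a)

# Hypothetical example sequences, chosen only for illustration.
print(reverse_complement("ATGC"))    # GCAT
print(forms_duplex("ATGC", "GCAT"))  # True
```

Real design software additionally has to account for stacking, mismatches and unwanted secondary structure, none of which is modeled here.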

In contrast to designed DNA nanostructures, the ra- 
tional design of protein nanostructures is much more 
complicated due to the complex cooperative interac- 
tions between amino acids stabilizing the fold of native 
proteins. A comparison of some features of self-assembled
DNA and protein nanostructures is presented
in Figure 1. The folding of most natural
proteins still cannot be easily predicted from their primary
structure because of the contribution of many cooperative and
long-range interactions between amino acids; the
de novo design of completely new protein folds is therefore even
more challenging.

However, significant progress has recently been
achieved in the development of strategies for building
artificial self-assembled bionanostructures, and the range
of both DNA and protein nanostructures has increased
rapidly in the last two decades. In this review we mainly
focus on protein-based nanostructure strategies, while
DNA nanotechnology has been discussed in detail in
many recent reviews [6-12].



Designed DNA nanostructures 

In 1982, Seeman proposed using DNA as the structural
material for bottom-up self-assembly [13], and he is
regarded as the founder of the field of DNA nanotechnology.
Since then, DNA-based self-assembly has achieved
spectacular results by relying on the base-pairing specificity
of nucleotides, DNA synthesis technology, computer-based
design and, above all, imaginative design.
Over the last three decades self-assembled DNA nanostructures
have been extensively studied and several different
approaches for building DNA nanostructures have
been developed. Self-assembled DNA nanostructures
range from 3D structures with a well-defined shape
[2,4,14-17] to a variety of complex dynamic DNA devices
[8,18-20]. This avenue of research also spawned
DNA computing [21,22] and the design of dynamic devices
[8,23,24], which are, however, beyond the scope of this
review.

DNA self-assembly is a robust and flexible biomimetic 
strategy for molecular construction that is directed by 
the information embodied in the nucleotide sequence. 
Development of DNA nanostructures encompasses sev- 
eral different approaches (Figure 2), where the design of 
nanostructures is based on the assembly of: 

- several medium-sized DNA oligonucleotides (a few tens
to 100 nucleotides) that form finite-sized
nanostructures [14];
Figure 1 Some features of self-assembled DNA- and protein nanostructures. Natural proteins comprise 20 amino acid residues with diverse
properties in comparison to only 4 structurally similar nucleotides, the building elements of nucleic acids. The advantages of protein nanostructures
also include cheaper manufacturing of building blocks, as well as the multiple cooperative interactions that govern protein nanostructures.
Figure 2 Different approaches for building DNA nanostructures. The design of a DNA nanostructure is based on the assembly of several
medium-sized oligonucleotides that form either (a) a finite-sized nanostructure or (b) assembled building blocks that further oligomerize into a
finite-sized nanostructure. (c) A DNA nanostructure can be assembled from a single long DNA scaffold (blue) and short oligonucleotides (red, green)
that hold the scaffold in place. (d) 2D and 3D nanostructures can be constructed from short DNA strands, DNA bricks.



- several medium-sized DNA oligonucleotides that assemble
into building blocks, which further oligomerize
into finite-sized structures such as different polyhedra
or into lattices [3,25];

- a single long DNA scaffold (e.g. several
thousand nucleotides of single-stranded phage
DNA) that is shaped into the selected structure by the
addition of short oligonucleotide clamps. This is the DNA
origami technique, invented by Paul Rothemund
[26]. The approach can result in complex 2D or 3D
shapes such as molecular raster images, a box, a sphere
etc. [27-30];

- a large number of short DNA bricks (32- or 42-nucleotide
strands, each forming a U-shaped brick)
that fill the 2D plane or 3D space, where the selected
structure is formed by omitting the appropriate
DNA bricks from the assembly mixture.
Almost any 2D or 3D shape can be formed by this
approach [15,31].

An important advantage of DNA-based nanostructures 
is that it is possible to address the selected positions 
within the 2D or 3D nanostructures at approximately 
5 nm resolution and introduce oligonucleotides with selected
functionalities, such as different organic compounds,
fluorophores, metal-binding groups, proteins
etc. into those positions, thereby functionalizing DNA
nanostructures [9,32-36]. 

RNA has the distinct advantage that ssRNA can easily
be produced in vivo in order to promote self-assembly.
This property was used to prepare RNA-based
scaffolds with attachment sites for functional proteins fused
to sequence-specific RNA-binding domains. While those
in vivo assembled structures were not well characterized,
the scaffold strongly enhanced the reaction yield [37],
similar to DNA-based scaffolded enzymes, where the
arrangement of enzymes was linear [38]. It is
hoped that this approach will be further developed
for in vivo applications. ssDNA can also be produced
in vivo, as demonstrated by the self-assembly of a
tetrahedron [39]. An isothermal DNA nanostructure assembly
strategy has been developed that could further facilitate
future DNA self-assembly in vivo [40].

DNA nanostructures have been used to make devices that
are functional in the cellular milieu, e.g. a drug delivery
container that encapsulates cargo, such as therapeutic
antibodies. Opening of the container is controlled by
aptamer locks, and the container opens only if the
triggering signals for both of its two locks are present
[41]. DNA origami seems to be stable in vivo, indicating
that it is relatively protected against nucleases. There are
also reports on the use of DNA nanostructures as
constituents of vaccines [42-44]. However, real applications
of DNA nanostructures are at the moment quite
rare, and essentially all DNA nanostructures are prepared
by chemical synthesis, which limits technological applications
because of the cost and scale of production.

Protein nanostructures 

Proteins provide masterful examples of complex self-assembling
nanostructures with properties and functionalities
beyond the reach of any human-made materials. It is
estimated that there are only a few thousand different protein
folds in nature, and recently the number of newly determined
protein folds has basically trickled to a halt despite the determination
of tens of thousands of new protein structures each year. So
far the folds of only a few small protein domains can be accurately
predicted [45-48], and the design of completely new folds without
resemblance to any existing native fold represents
an even greater challenge [49].

Larger natural proteins have evolved through combi- 
nations of several smaller independently folding domains. 
Protein oligomerization based on the symmetric oligome- 
rization domains is an important source of suprastructured 
proteins [50]. Existing protein oligomerization domains have 
been recognized as suitable building blocks for the predict- 
able bottom-up design of artificial protein nanostructures. 
Strategies that used modified natural domains, or genetically
or chemically linked secondary structure elements, for
self-assembly and that resulted in the formation of symmetric
intermolecular protein assemblies, lattices and heterogeneous
cage-like assemblies are described in reviews [51-53].



Recently we presented a new approach in which a single
polypeptide chain composed of concatenated coiled-coil-forming
peptides self-assembled into a new topological fold,
an asymmetric tetrahedron-like cage, which is defined and
stabilized by the specific pairing of the coiled-coil-forming
segments arranged in a precisely defined order rather than by
cooperative packing of a hydrophobic protein core [54].

Assemblies based on linked natural protein 
oligomerizing domains 

The first strategy for the creation of designed protein
nanostructures relied on interactions between oligomerizing
protein domains, which typically comprise 100-200 or
more amino acid residues. The domains can self-assemble
non-covalently but specifically into larger superstructures.
This direction was pioneered with a fusion
strategy [55]. Two different oligomerizing domains,
one promoting dimerization and the other promoting
homo-trimerization, were linked by a semi-rigid linker
(Figure 3a). Several copies of such a fusion protein were
able to self-assemble into small, symmetric, cage-like but
heterogeneous assemblies, or into extended fibrils, depending
on the length of the helical linker. Recent refinement of
the original protein sequence resulted in a homogeneous
12-subunit assembly, confirmed by X-ray crystal structure
determination. The structure of this oligomeric nanostructure
reveals tetrahedral geometry with a diameter of 16 nm
[56,57].

Figure 3 Design strategies for symmetric domain-based intermolecular protein assemblies. (a) Fusion of natural oligomerizing protein
domains. Two different oligomeric protein domains (dimerization domain (pink), trimerization domain (blue)) are genetically fused via a helical
linker (violet) to obtain a single-chain building block that self-assembles into a 12-subunit cage-like structure with tetrahedral shape (4d9j) [56].
(b) Novel protein domain interface design. Computational design of additional interaction surfaces (red) on a natural trimerization domain (blue)
leads to the formation of a 12-subunit assembly with tetrahedral symmetry or a 24-subunit assembly with octahedral symmetry (4ddf) [62].

This approach provides the possibility of creating smart
bionanomaterials by regulating assembly and disassembly.
Self-assembly of a fusion protein composed of
the dimerizing gyrase B domain and a trimerization domain
can be driven by the addition of a small molecule.
The addition of the pseudo-dimeric gyrase B ligand
coumermycin induced the formation of hexagonal assemblies,
which dissociated upon the subsequent addition of the
monomeric ligand novobiocin, which competes for binding
to the same gyrase B site as the pseudo-dimeric
coumermycin [58].

The extended fusion strategy circumvented the problem
of connecting two oligomerization domains in a fixed relative
orientation, which ensured well-ordered self-assembled
protein nanostructures [59]. The authors showed that a fusion
protein can be made by selecting two or more connections
between the adjacent oligomers if the two domains are
joined along an axis of symmetry that both oligomerization
domains share. Although this symmetry-matching
fusion protein strategy successfully produced linear
filaments, two-dimensional lattices and large solid aggregates,
it is not suitable for designing defined cage-like
structures.

Engineering new interaction surfaces into native 
protein domains 

In the strategies described above the range of suitable protein
domains is limited by restrictions regarding the symmetry
axes of the natural domains. A further step towards
the design of artificial protein nanostructures was taken by
engineering domain surfaces for weak non-covalent interactions
in the self-assembly process. The analysis of natural
contact interfaces between protein domains disclosed
the rules governing domain association. The contacting
surfaces should be complementary and predominantly
non-polar. The contribution of hydrogen bonds and salt
bridges at the contact rim is negligible. Employing these
rules, it was demonstrated that a given protein can be engineered
to form new contact interfaces that produce a
number of novel assemblies [60]. The Rosetta algorithm for
modeling protein-protein interactions [61] enables de novo
design of interacting interfaces that can drive the self-assembly
of designed proteins into a desired symmetric
architecture [46,62]. In a recent study, the computational design
of protein nanostructures with atomic-level accuracy
was described [62]. Protein building blocks based on natural
trimeric protein domains were docked together
symmetrically into the target packing arrangements, and low-energy
protein-protein interaction interfaces were designed
between the building blocks in order to drive the self-assembly
(Figure 3b). The designed proteins assembled into cage-like
nanostructures with either tetrahedral or octahedral point
group symmetry, which was confirmed by crystal structures.
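
The subunit counts quoted for these cages (12 for tetrahedral, 24 for octahedral symmetry) follow from the order of the corresponding rotation groups. The sketch below reproduces that arithmetic in Python under two stated assumptions that are not spelled out in the text: one subunit occupies each asymmetric unit, and each trimeric building block sits on a three-fold axis of the cage. The function name is hypothetical.

```python
# Orders of the rotational (chiral) point groups of the two cage symmetries.
ROTATION_GROUP_ORDER = {"tetrahedral": 12, "octahedral": 24}

def cage_composition(symmetry, oligomer_size=3):
    """Subunits and building blocks in a cage with one subunit per asymmetric unit.

    Assumes every building block is an oligomer of `oligomer_size` subunits
    whose own symmetry axis coincides with a matching axis of the cage.
    """
    order = ROTATION_GROUP_ORDER[symmetry]
    if order % oligomer_size != 0:
        raise ValueError("oligomer size does not divide the group order")
    return {"subunits": order, "building_blocks": order // oligomer_size}

print(cage_composition("tetrahedral"))  # {'subunits': 12, 'building_blocks': 4}
print(cage_composition("octahedral"))   # {'subunits': 24, 'building_blocks': 8}
```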

Modular approach for de novo designed protein 
nanostructures 

The strategies employing oligomerizing protein domains 
for designing new protein structures, described above, 
are limited to homologues of known native protein folds. 
Next-generation engineering approaches are based
on modules that can be considerably smaller than the
typical protein domain. The modules comprise interacting
de novo designed secondary structure elements that
are predictably combined with specified partners to form
larger assemblies. De novo protein design refers to attempts
to construct completely new protein sequences
for prescribed structures based on the principles defining
the stability and selectivity of building modules; in
de novo design the polypeptide sequence is selected by
the designer.

Modularity and orthogonality are two foundational concepts
of de novo design and engineering of new protein
nanostructures. Instead of optimizing the numerous
cooperative interactions that underpin the structures of
natural proteins, the use of well-understood structural
modules, which can be combined into complex nanostructures,
was proposed. α-Helices and β-strands represent
attractive protein folding motifs to serve as building
blocks for well-ordered and defined nanostructures with
complex architecture [63-67].

The most studied modules for building self-assembled
protein nanostructures are interacting helical peptides,
particularly coiled-coils. They are ubiquitous facilitators of
inter- and intramolecular protein-protein interactions and
comprise two or more intertwined α-helices that are
encoded by the characteristic heptad sequence repeat,
where residues are labeled abcdefg. The non-covalent
interactions that drive the formation of coiled-coils are the
hydrophobic interactions between amino acids at positions a
and d, which form the hydrophobic core of the coiled-coil, and the
electrostatic interactions between oppositely charged residues
at positions e and g. The rules governing coiled-coil
formation, oligomerization state and interaction partner
specificity have been well established over
the last decades [68,69]. On the basis of those rules, sets
of orthogonal designed coiled-coils were developed as a toolkit
for designed protein assemblies [70-75].
Engineered coiled-coil polypeptides have been used to assemble
different nanomaterials: nanofibres [76,77], membranes
[78], nanotubes [79], nanostructured films [80],
spherical structures [81], responsive hydrogels [82,83],
spheres [84] etc. Homogeneous nanoparticles with regular
polyhedral symmetry, about 16 nm in diameter, were prepared
from a single type of polypeptide chain in which
two coiled-coil modules with different oligomerization
states were joined by a short linker [85]. In another study
two oligomerizing coiled-coil peptides were tethered via a
disulphide bond close to their center. The self-assembled
molecules spontaneously curved into spherical cage-like
particles, with a hexagonal pattern on the cage surface
and about 100 nm in diameter [84]. Another example is
discrete circular nanostructures of defined stoichiometry;
trimers or tetramers of < 10 nm were observed when the
linker between two coiled-coil-forming segments comprised
6-10 residues. Larger colloidal-scale assemblies as well
as flexible fibers were formed when shorter linkers limited
flexibility between peptides [86].
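
The heptad rule described above (hydrophobic residues at positions a and d, charged residues at e and g) can be checked for a candidate sequence with a few lines of Python. This is only a sketch: the residue classes, the function names and the toy sequence are assumptions made for illustration, and real coiled-coil design uses far more detailed rules than this occupancy check.

```python
HEPTAD = "abcdefg"
HYDROPHOBIC = set("AILMFVWY")  # assumed residue classification
CHARGED = set("DEKR")          # assumed residue classification

def heptad_register(sequence, first_position="a"):
    """Label every residue with its heptad position, starting at `first_position`."""
    offset = HEPTAD.index(first_position)
    return [HEPTAD[(offset + i) % 7] for i in range(len(sequence))]

def position_fraction(sequence, positions, residue_set, first_position="a"):
    """Fraction of residues at the given heptad positions that belong to `residue_set`."""
    register = heptad_register(sequence, first_position)
    picked = [aa for aa, pos in zip(sequence, register) if pos in positions]
    if not picked:
        return 0.0
    return sum(aa in residue_set for aa in picked) / len(picked)

# Hypothetical toy peptide: four identical heptads, not a published design.
toy = "IAALKQE" * 4
print(position_fraction(toy, "ad", HYDROPHOBIC))  # 1.0
print(position_fraction(toy, "eg", CHARGED))      # 1.0
```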

Designed topological protein folds based on interacting 
coiled-coil modules 

A recent innovative approach to constructing new engineered
self-assembled protein nanostructures is based on
concatenated interacting dimerizing modules, each comprising
up to 45 amino acid residues [54]. The tetrahedral nanostructure
was built from only a single polypeptide chain;
this strategy may appropriately be called designed protein
origami, as opposed to native protein structures that fold
into a defined 3D structure from a single chain.

Rather than folding the structure based on the interactions
between residues in the hydrophobic core, as in
native proteins, the modular topological design is
based on pairwise interactions between concatenated
secondary structure elements (coiled-coil-forming segments),
whose folding and orthogonality are engineered
independently. Orthogonality of the coiled-coil building
modules used ensures that each segment preferentially
binds to its designated partner segment within the same
polypeptide chain. The final topology is defined by the
sequential order of the coiled-coil segments. The topological
fold comprises a cavity bounded by coiled-coil dimers as
the edges of the polyhedron. This type of modular self-assembly
therefore resembles in many aspects the principles
of DNA nanostructures [2,3,26], where polyhedra
have been constructed from complementary
DNA segments.

According to this approach, long-range non-covalent interactions
occur between coiled-coil-forming segments,
which dimerize independently of the other segments. The
coiled-coil-forming segments are concatenated in a precisely
defined order, with intervening flexible linkers between
the segments to provide hinge-like flexibility.
In the case of the monomeric tetrahedron, which was constructed
to demonstrate the principle, the polypeptide
chain is composed of 12 designed coiled-coil dimer-forming
segments, each forming an orthogonal coiled-coil
dimer with its partner segment within the same polypeptide
chain (Figure 4). In this way it forms the 6 edges of a
tetrahedron, while the flexible linkers are positioned at the
vertices. The polypeptide was produced in recombinant
form in E. coli and self-assembled by slow dialysis or
temperature annealing into a tetrahedral structure whose
edges measure around 5 nm. This direction opens an exciting
perspective for the creation of additional entirely
new protein folds. The principle of protein assembly can
benefit significantly from the application of mathematical
topology, which can be used to analyze the number
of theoretical solutions and may in the future be applied
to optimize the kinetics of the assembly [87]. The results
of protein nanocage engineering show that modular design
can be used for complex structures, with the potential
for applications in biocatalysis, targeted drug delivery,
vaccination, etc. [88].
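
The topological requirement described above can be stated as a small graph problem: a chain of 12 segments must trace every one of the 6 tetrahedron edges exactly twice, turning at a vertex after each segment. The Python sketch below searches for one such path. It is an illustration under stated assumptions, not the published design procedure: the search routine, its names and the default of four parallel dimers (taken from the Figure 4 caption) are choices made here, and the path it returns need not match the segment order of the actual construct.

```python
from itertools import combinations

VERTICES = range(4)                                        # corners of a tetrahedron
EDGES = [frozenset(e) for e in combinations(VERTICES, 2)]  # its 6 edges

def find_chain_path(n_parallel=4, start=0):
    """Search for a walk that passes along every edge exactly twice.

    Each step of the walk stands for one coiled-coil-forming segment, so a
    complete walk has 12 steps. An edge passed twice in the same direction
    models a parallel dimer; an edge passed in opposite directions models an
    antiparallel dimer. Returns a list of 13 vertices, or None.
    """
    passes = {edge: [] for edge in EDGES}  # directed passes recorded per edge
    walk = [start]

    def extend():
        if len(walk) == 2 * len(EDGES) + 1:
            parallel = sum(1 for p in passes.values() if p[0] == p[1])
            return parallel == n_parallel
        here = walk[-1]
        for nxt in VERTICES:
            if nxt == here:
                continue
            edge = frozenset((here, nxt))
            if len(passes[edge]) == 2:     # this edge already holds a dimer
                continue
            passes[edge].append((here, nxt))
            walk.append(nxt)
            if extend():
                return True
            walk.pop()                     # backtrack
            passes[edge].pop()
        return False

    return list(walk) if extend() else None

walk = find_chain_path()
print(walk)
```

Because every vertex of the tetrahedron meets three edges and each edge is traversed twice, every vertex is entered and left an even number of times, so any such path necessarily ends at the vertex where it started.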

Conclusions and future prospects 

The recent successes in the design of new bionanostructures
based on DNA and protein demonstrate the potential of
this approach to engineer new functional nanostructures.
While DNA-based nanostructures are clearly ahead of
the designed protein nanostructures in terms of the
complexity of the designed structures, so far they have lacked
tangible applications. Although it has been demonstrated
that DNA-based nanostructures are functional in organisms,
the use of in vivo produced and assembled nucleic
acid-based nanostructures would represent an important
step forward, both for the production cost and for new biological
applications. Functionalization of nucleic acids
could combine structural design with precisely addressed
functionalities. However, proteins adopt much larger
conformational variability than nucleic acids and provide
more versatile functionality. De novo design of protein
nanostructures has been limited to a small number of application
cases, which predominantly utilize repurposed
natural protein domains. Nevertheless, the design of protein
assemblies has matured beyond proof of principle
and is ready to face more complex challenges. New
emerging paradigms such as the topological protein folds
open completely new avenues that seem not to have been
adopted or perhaps even tested by nature. Future developments
will demonstrate the potential of different strategies,
or their combinations, with respect to the precise
engineering of nanostructures and the theoretical limitations
of different platforms. The next stage will need to
focus on application development. The potential applications are
numerous, from targeted drug and biomolecule delivery, vaccine
design, tissue engineering, sensor design and biocatalysis
to bionanomaterials science. The interdisciplinary approach
of synthetic biology, combining structural biology,
molecular biology, mathematics, engineering and many
other disciplines, has the potential to join forces in this
exciting opportunity.

Figure 4 Protein origami: modular topological design of protein structure from a single polypeptide chain. A toolbox for constructing a
tetrahedron-like cage comprises six orthogonal pairs of coiled-coil-forming peptides: two antiparallel and four parallel dimers (orientation is
denoted by arrows). Twelve peptides were concatenated in a defined order, separated by tetrapeptide linkers. The single polypeptide chain
served as a building block that self-assembled into a monomeric, asymmetric tetrahedron-like nanostructure [54].